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Abstract — Recovering intrinsic data structure from corrupted 
observations plays an important role in various tasks in the 
communities of machine learning and signal processing. In this 
paper, we propose a novel model, named log-sum heuristic 
recovery (LHR), to learn the essential low-rank structure from 
corrupted data. Different from traditional approaches, which 
directly utilize norm to measure the sparseness, LHR intro- 
duces a more reasonable log-sum measurement to enhance the 
sparsity in both the intrinsic low-rank structure and in the sparse 
corruptions. Although the proposed LHR optimization is no 
longer convex, it still can be effectively solved by a majorization- 
minimization (MM) type algorithm, with which the non-convex 
objective function is iteratively replaced by its convex surrogate 
and LHR finally falls into the general framework of reweighed 
approaches. We prove that the MM-type algorithm can converge 
to a stationary point after successive iteration. We test the 
performance of our proposed model by applying it to solve two 
typical problems: robust principal component analysis (RPCA) 
and low-rank representation (LRR). For RPCA, we compare 
LHR with the benchmark Principal Component Pursuit (PCP) 
method from both the perspectives of simulations and practical 
applications. For LRR, we apply LHR to compute the low- 
rank representation matrix for motion segmentation and stock 
clustering. Experimental results on low rank structure learning 
demonstrate that the proposed Log-sum based model performs 
much better than the -based method on for data with higher 
rank and with denser corruptions. 

Index Terms — Log-sum heuristic, compressive sensing, sparse 
optimization, matrix learning, nuclear norm minimization. 



I. Introduction 

Learning the intrinsic data structures via matrix analysis 
|[T||l2l has received wide attention in many fields, e.g., neural 
network|[3l, learning system|i4J|i5J, control theory |6|, com- 
puter vision QIU and pattern recognition [91|10|. There 
are quite a number of efficient mathematical tools for rank 
analysis, e.g., Principal Component Analysis (PCA) and Sin- 
gular Value Decomposition (SVD). However, these typical 
approaches could only handle some preliminary and simple 
problems. With the recent progresses of compressive sensing 
ifm . a new concept on nuclear norm optimization has emerged 
into the field of rank minimization lfT2l and has led to a 
number of interesting applications, e.g. low rank structure 
learning (LRSL) from corruptions. 
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LRSL is a general model for many practical problems in the 
communities of machine learning and signal processing, which 
considers learning a data of the low rank structure from sparse 
errors |13| |14||15||16|. Such problem can be formulated as: 
P — /(A) + (7(E), where P is the corrupted matrix observed 
in practical world; A and E are low-rank matrix and sparse 
corruption, respectively and the functions /(■) and g{-) are 
both linear mappings. Recovering two variables (i.e., A and 
E) just from one equation is an ill-posed problem but is still 
possible to be addressed by optimizing: 



(PO) min ranA:(A) + AIIEIL 

(A,E) ' 

s.t. P = /(A)+5(E). 



(1) 



In (PO), rank{A) is adopted to describe the low-rank structure 
of matrix A, and the sparse errors are penalized via ||E||£q, 
where £q norm counts the number of all the non-zero entries in 
a matrix. (PO) is always referred as sparse optimization since 
rank term and £0 norm are sparse measurements for matrices 
and vectors, respectively. However, such sparse optimization 
is of little use due to the discrete nature of (PO) and the exact 
solution to it requires an intractable combinatorial search. 

A common approach that makes (PO) trackable tries to 
minimize its convex envelope, where the rank of a matrix 
is replaced by the nuclear norm and the sparse errors are 
penalized via ii norm, which are convex envelopes for rank{-) 
and II • Wig, respectively. Although this framework is devel- 
oped on two different norms, essentially, it is based on £1 
heuristic since nuclear norm can be regarded as a specific 
case of £1 norm fTl]. In practical applications, LRSL via 
£1 heuristic is powerful enough for many learning tasks with 
relative low rank structure and sparse corruptions. However, 
when the desired matrix becomes complicated, e.g., it has 
high intrinsic rank structure or the corrupted errors become 
dense, the convex £1 heuristic approaches may not achieve 
promising performances. In order to handle those tough tasks 
via LRSL, in this paper we take the advantages of non-convex 
approximation, rather than the convex £1 heuristic, to better 
enhance the sparseness of signals. 

We propose log-sum heuristic recovery (LHR) to use log- 
sum term as the basic sparse heuristic functionality for sparse 
optimization in (PO). There are mainly two reasons that we 
use such a non-convex term to conduct LRSL. The main 
reason is due to its sparseness. It is indicated in [18 1 that 
log-sum term is a closer approximation to the £0 norm than 
the £1 norm for sparse vector representation. Therefore, it 
naturally inspires us to generalize its advantages from vector 
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recovery to matrix learning. Moreover, although the objective 
function derived from the log-sum term is non-convex, it is 
also possible to solve it by convex optimizations because the 
convex surrogates of log-sum term can be well defined by 
Taylor expansion. We will introduce an effective non-convex 
optimization strategy called majorization-minimization{MM) 
IIlllllOl to solve it next. 

MM algorithm is implemented in an iterative way that it 
first replaces the non-convex component of the objective with 
a convex upper-bound and then to minimize the convex upper- 
bound, which exactly makes the non-convex problem fall into 
the general paradigm of the reweighted schemes. Accordingly, 
it is possible to solve the non-convex optimization following a 
sequence of convex optimizations and we will prove that with 
the MM framework, LHR finally converges to a stationary 
point after successive iterations. 

LHR is a general paradigm for LRSL and we will adapt 
it to two specific models for practical applications. In a 
nutshell, LHR will be used to solve the problems of low 
rank matrix recovery (LRMR) and low rank representation 
(LRR). In LRMR, LHR is used to recover a low rank matrix 
from sparse corruptions and its performance will be compared 
with the benchmark Principle Component Pursuit (PCP) |T3l 
method. In practice, our approach often performs very well in 
spite of its simplicity. By numerical simulations, LHR could 
handle many tough tasks that typical algorithm fails to handle. 
Moreover, the feasible region of LHR is much larger than PCP, 
which implies that it could deal with much denser corruptions 
and exhibits much higher rank tolerance. The feasible region 
of PCP subjects to the boundai-y of r;^'^^ + ^^^^ = 0.35, 
where ij and ^ are rank rate and error rate, respectively. 
With the proposed LHR model, the feasible boundary can be 
extended to ij^hb. _^^lhr _ q 53 'pjjg advancements are also 
verified on two practical applications of shadow removal on 
face images and video background modeling. 

In the second task of low rank representation, the power of 
LHR model will be generalized to low rank representation for 
subspace clustering (SC), the goal of which aims at recovering 
the underlying low rank correlation of subspaces in spite of 
noisy disturbances. In order to judge the performances, we 
will first apply it to motion segmentation in video sequences, 
which is a benchmark test for SC algorithms. Besides, in order 
to highlight the robustness of LHR to noises and disturbances, 
we apply LHR to stock clustering that is to determine a stock's 
industrial category given its historical price record. From 
both the experiments, LHR gains higher clustering accuracy 
than other state-of-the-art algorithms and the improvements 
are especially noticeable on the stock data which includes 
significant disturbances. 

The contributions of this work are three-folds: 

• This work presents a log-sum heuristic recovery (LHR) 
algorithm to handle the typical LRSL problem with an 
enhanced sparsity term. We introduce a majorization- 
minimization algorithm to solve the non-convex LHR 
optimization with reweighted schemes and theoretical 
justifications are provided to prove that the proposed 
algorithm converges to a stationary point. 

• The proposed LHR model extends the feasible region of 



existing £1 norm based LSRL algorithm, which implies 
that it could successfully handle more learning tasks with 
denser corruptions and higher rank. 
• We apply the LHR model to a new task of stock clustering 
which serves to demonstrate that low rank structure 
learning is not only a powerful tool restricted in the areas 
of image and vision analysis, but also can be applied to 
solve the profitable financial problems. 
The remainder of this paper is organized as follows. We 
review previous works in Section [III Section |lll| introduces 
the general LHR model and discusses how to solve the non- 
convex LHR by MM algorithm. We addresses the low rank 
matrix recovery (LRMR) problem and compare LHR model 
with PCP from both the simulations and practical applications 
in Section |lVl The LHR model for low rank representation 
and subspace segmentation is discussed in Section |V] Section 
I VII concludes this paper. 

II. Previous works 

In this part, we review some related works from the fol- 
lowing perspectives. First, we discuss two famous models 
in LRSL, i.e.. Low Rank Matrix Recovery (LRMR) from 
corruptions and low rank representation (LRR). Then, some 
previous works about Majorization-Minimization algorithm 
and reweighted approaches are presented. 

A. Low rank structure learning 

1) Low rank matrix recovery: Corrupted matrix recovery 
considers decomposing a low rank matrix from sparse cor- 
ruptions which can be formulated as P = A + E, where 
A is a low rank matrix, E is the sparse error and P is the 
observed data from real world devices, e.g. cameras, sensors 
and other equipments. The rank of P is not low, in most 
scenarios, due to the disturbances of E. How can we recover 
the low rank structure of the matrix from gross errors? This 
interesting topic has been discussed in a number of works, 
e.g. QSl lO and lUS). Wright et al. proposed the PCP (a.k.a. 
RPCA) to minimize the nuclear norm of a matrix by penalizing 
the ii norm of errors iHHl . PCP could exactly recover the low 
rank matrix from sparse corruptions. In some recent works, 
Ganesh et al. investigated the parameter choosing strategy for 
PCP from both the theoretical justifications and simulations 
fJT). In this work, we will introduce the reweighted schemes 
to further improve the performances of PCP. Our algorithm 
could exactly recover a corrupted matrix from much denser 
errors and higher rank. 

2) Low rank representation: Low rank representation||5l is 
a robust tool for subspace clustering [22], the desired task of 
which is to classify the mixed data in their corresponding sub- 
spaces/clusters. The general model of LRR can be formulated 
as P = PA + E, where P is the original mixed data, A is 
the affine matrix that reveals the correlations between different 
pairs of data and E is the residual of such a representation. 
In LRR, the affine matrix A is assumed to be low rank and 
E is regarded as sparse corruptions. Compared with existing 
SC algorithms, LRR is much robust to noises and archives 
promising clustering results on public datasets. In this work. 
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inspired by LRR, we will introduce the log-sum recovery 
paradigm to LRR and show that, with the log-sum heuristic, 
its robustness to corruptions can be further improved. 



B. MM algorithm and reweighted approaches 

Majorization-Minimization (MM) algorithm is widely used 
in machine learning and signal processing. It is an effective 
strategy for non-convex problems in which the hard problem 
is solved by optimizing a series of easy surrogates. Therefore, 
most optimizations via MM algorithm fall into the framework 
of reweighted approaches. 

In the field of machine learning, MM algorithm has been 
applied to parameters selection forbayesian classification ||23l . 
In the area of signal processing, MM algorithm leads to a 
number of interesting applications, including wavelet-based 
processing ll24l and total variation (TV) minimization ll25l . 
For compressive sensing, reweighted method was used in 
£i heuristic and led to a number of practical applications 
including portfoUo management ll26l and image processing 
flE]. Reweighted nuclear norm was first discussed in (TT) and 
the convergence of such approach has been proven in |28|. 

Although there are some previous works on reweighted 
approaches for rank-minimization, our approach is quite differ- 
ent. First, this work tries to consider a new problem of low rank 
structure learning from corruptions while not on the single task 
of sparse signal or nuclear norm minimization. Besides, exist- 
ing works on reweighted nuclear norm minimization in 1271 
ll28l are solved by semi-definite programming which could 
only handle the matrix of relative small size. In this paper, we 
will use the first order numerical algorithm (e.g., alternating 
direction method (ADM)) to solve the reweighed problem, 
which can significantly improve the numerical performance. 
Due to the distributed optimization strategy, it is possible 
generalize the learning capabilities to large scale matrices. 



III. Corrupted matrix recovery via log-sum 

HEURISTIC 

In this part, we first discuss the widely used £i -based 
method for LRSL. Then, the log-sum heuristic (LHR) ap- 
proach is proposed and we introduce the MM algorithm to 
solve it. Finally, theoretical justifications are presented to prove 
that LHR can converge to a stationary point by reweighted 
approaches. 



A. £i heuristic for corrupted low rank matrix recovery 

As stated previously, the basic optimization (PO) is non- 
convex and generally impossible to be solved as its solution 
usually requires an intractable combinatorial search. In order 
to make ([T]i it trackable, convex alternatives are widely used in 
a number of works, e.g. llT3l 1141 . Among these approaches, 
one prevalent method tries to replace the rank of a matrix by 
its convex envelope, i.e., the nuclear norm and the sparsity 
is penalized via £i norm. Accordingly, by convex relaxation. 



the problem in (|2]i can actually be recast as a semi-definite 
programming. 



min ||A|U + A||E||,^ 

(A,E) 

S.t. P-/(A)+5(E), 



(2) 



where ||A||* = ^ o'i(^), is the nuclear norm of the matrix 

1=1 

which is defined as the summation of the singular values of A; 
and ||E||£j = is the £i norm of a matrix. Although 

the objective in Q involves two norms: nuclear norm and £i 
norm, its essence is based on the li heuristic. We will verify 
this point with the following lemma. 

Lemma III.l. For a matrix X G M™^", its nuclear norm is 
equivalent to the following optimization: 



IXII 



min UtriY) + trCL)] 
(Y,Z,X) ^ 

' Y X 
X^ z 



s.t. 



^0, 



(3) 



where Y £ W^'^"\Z e M"^" are both symmetric and 
positive definite. The operator tr{-) means the trace of a matrix 
and >^ represents semi-positive definite. 

The proof of Lemma llII.il may refer to l26ll lT2l . According 
to this lemma, we can replace the nuclear norm in (|2]i and 
formulate it in the form of: 

i[tr(Y)+tr(Z)]+A||E|l,^ 

^ ^ 1 ^ n (4) 

A^ Z J - " 
P = /(A)+5(E). 

From Lemma llll.il we know that both Y and Z are symmetric 
and positive definite. Therefore, the trace of Y and Z can 
be expressed as a specific form of li norm, i.e. tr{Y) = 
\\diag(Y)\\i-^. diag{Y) is an operator that only keeps the 
entries on the diagonal position of Y in a vector. Therefore, 
the optimization in (01) can be expressed as: 



mm 

(Y,Z,A,E) 
S.t. 



min ]-{\\diag{Y)\\ii^ 
xeD ^ 



||diag(Z)||,J + A||E|| 



(5) 



where X = {Y, Z, A, E} and 
b {(Y,Z,A,E) 



Y A 

A^ Z 



>r 0,(A,E) e C}. 



(A, E) G C stands for convex constraint. 

B. Log-sum heuristic for matrix recovery 

By Lemma llII.il the convex problem with two norms in 
(O has been successfully converted to an optimization only 
with l\ norm and therefore it is called £i -heuristic. l\ norm 
is the convex envelope of the concave £o norm but a number 
of previous research works have indicated the limitation of 
approximating £q sparsity with £\ norm, e.g., lT8l l |29l . It 
is natural to ask, for example, whether might a different 
alternative not only find a correct solution, but also outperform 
the performance of £\ norm? Next we will introduce the log- 
sum term to represent the sparsity of signals. 



4 



Definition III.2. For any matrix X G R™^", the log-sum term 
is defined as ||X||i = log(|Xij| + S), where 5 > is a 
small regularization constant. 

The prominent reason that we use this term is mainly due to 
its sparsity. As indicated in ifTSl . the log-sum term lies between 
the scope of the norm and norm, which makes it be a 
closer approximation of i?o norm |18|[26| and therefore, it is 
used to encourage the sparsity in the optimization. We propose 
Log-sum Heuristic Recovery (LHR) model H{X): 

(LHR)iJ(X) = min i(||dmg(Y) ||l + \\diag{Z)\\L) + A I|E|| 

(6) 

From the formulation of LHR, obviously, it differs from Q 
only on the selection of the sparse norm, where the later uses 
log-sum term instead of the typical li norm. Although we have 
placed a powerful term to enhance the sparsity in LHR model, 
unfortunately, it also causes non-convexity into the objective 
function. The LHR model is not convex since the log-function 
over M++ = ((5, oo) is concave. In most cases, non-convex 
problem can be extremely hard to solve. Fortunately, the 
convex upper bound of || • ||i can be easily found and defined. 
Therefore, we will introduce the majorization-minimization 
algorithm to solve the LHR optimization. 

C. The majorization-minimization for LHR optimization 

The majorization-minimization (MM) algorithm replaces 
the hard problem by a sequence of easier ones. It proceeds in 
an Expectation Maximization (EM)-like fashion by repeating 
two steps of Majorization and Minimization in an iterative 
way. During the Majorization step, it constructs the convex 
upper bound of the non-convex objective. In the Minimization 
step, it minimizes the upper bound. 

To see how the MM works for LHR, Let's recall the 
objective function in ^ and make some simple algebra 
operations: 

\[\\diag{Y)\\^ + \\dzag{Z)\\^]+\\\n\L 
= 5E, ^og{Y.-n +5) + Y.k log(^fcfc + 5)] 

+ \T.^,^og{\E,,\+5) il) 
= i[logdet(Y + 51^) + logdGt(Z + 

+ AE.,log(|i?.,| + <5) 

where e jgmxm identity matrix. It is well known 

that the concave function is bounded by its first-order Taylor 
expansion. Therefore, we calculate the convex upper bounds 
of all the terms in (|7]l- For the term logdet(Y + Sim), 

logdet(Y + 51^) < logdet(ry + 5I„0 

+ tr[(ry + Ji„)-i(Y-rY)]. 

The inequality in (|8]l holds for any Fy >- 0. Similarly, for any 

{TEh > 0, 

E,, iog(|i?,, I + <5) < E., [iog[(r^,)., + 5] + 

(9) 

We replace each term in (Q with the convex upper bound 
and define T{X\T) as the surrogate function after convex 



relaxation. Therefore, we can instead optimize the following 
problem 

min r(l|f ) = iir[(Fy -|- Sl^r^Y] + ^tr[{T, + Sl,,)-^Z] 
xeD 

+ ^ i^E,, + S)-^E,j + const, 

(10) 

In ([Toll, set X = {Y, Z,A,E}, which contains all the 
variables to be optimized and set F = {Fy, F^, F^} contains 
all the parameter matrices. At the end of ( fTOb . const stands 
^for the constants that are irrelative to {Y, Z, A,E}. In some 
previous works of MM algorithms Ii23l ifTSl ||28l, they denote 
the parameter F in t^*^ iteration with the optimal value of 
X of the last iteration, i.e. F = X* . According to the 
discussions above, we provide the MM algorithm for (LHR) 
minimization in Algorithm [T] Before elaborately discussing 



Algorithm 1: A MM algorithm for LHR minimization 
Initialization : f := 

1 repeat 

Majorization : 

2 f*:=i:*; 

3 Define convex upper bound r(X|f *); 
Minimization: 

4 = argmin.r(X|f*); 

xeb 

5 t :=< + !; 

6 until convergence; 



how to numerically solve the optimization, we wiU first discuss 
some theoretical properties of it. 

D. Theoretical justifications 

In this part, for simplicity, we define the objective function 
in ^ as H{X) and the surrogate function in ( fTOl l is defined 
as T{X\T). X is a set containing all the variables and set 
T records the parameter matrices. The convergence property 
of general MM algorithm was separately distributed on some 
early mathematical journals |30|[31| which is a bit obscure 
and were not generally read by researchers in the community 
of computer science. Besides, previous works on MM con- 
vergence are almost on the variable selection models. In this 
paper, we specify it to our LHR model and try to explain it 
in a plain way. Before discussing the convergence property of 
LHR, we will first provide two lemmas. 

Lemma III.3. If set F* := X^, MM algorithm could mono- 
tonically decrease the non-convex objective function H{X), 
i.e. H{X'+^) < H{X% 

Proof: In order to prove the monotonically decrease 
property, we can instead prove: 

< r(X*+i|f*) < T{X'p) = H{X'). (11) 

We prove (fTTI) by the following three steps: 

(i) The first inequality follows from the argument that 
T{X\t) is the upper-bound of H{X). 
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(ii) The second inequality holds since the MM algorithm 
computes — argminT(X|r*). The function r(-) is 

X 

convex, therefore, is the unique global minimum. This 

property guai-antees that r(X*+^|f*+^) < r(-|f*) with any 
X ^ and T(l*+i|f*+i) = T(-|f*) if and only if 

(iii) The last equality can be easily verified by expanding 
T(X*|r*) and making some simple algebra. The transforma- 
tion is straightforward and omitted here. ■ 

Lemma III.4. Let X = be a sequence 

generated by MM framework in Algorithm\l] after successive 
iterations, such a sequence converges to the same limit point. 

Proof: We give a proof by contradiction. We assume that 
sequence X diverges, which means that lim 7^ 

t^oo 

0. According to the discussions in Appendix IB] we know that 
there exists a convergent subsequence X*-'' converging to i.e. 

lim X*'' = (f) and meanwhile, we can construct another con- 

fc— f 00 

vergent subsequence X**^^ that lim = Lp. We assume 

k^oo 

that (f) ^ ip. Since the convex upper-bound T{-\T) is contin- 

T{ lim X^>'+^ |f*^) < 



uous, we get lim T{X^''+^\T^>' 

k—>-oo 



T(lim X*" |f*^) = lim r(X*'=|f*'= 



The strict less than 



operator "<" holds because Lp ^ cf). See (ii) in the proof 
of Lemma IIII.3I for details. Therefore, it is straightfor- 
ward to get the following inequalities: lim < 

k^oo 

lim T(X**+i|f*'') < lim T(X*'=|f*'=) = lim HiX*"). 

k^oo k-^00 /c— s-oo 

Accordingly, 



lim < lim HiX^" 



(12) 



Besides, it is obvious that the function of H{-) in ^ is 
bounded below, i.e. H{X) > [mn + m + n) log J. Moreover, 
as proved in Lemma UlI. 31 H{X) is monotonically decreasing, 

which guarantee that lim H{X*) exists, i.e. 

t— f 00 



lim HiX^") = lim H{X*) = lim H{X^+^) 
= lim H{X*''+^) 



(13) 



Obviously, (fT3T l contradicts to ( fT2] i. Therefore, (j> — p and we 
get the conclusion that lim - X* = 0. ■ 

Based on the two lemmas proved previously, we can give 
the convergence theorem of the proposed LHR model. 

Theorem III.5. With the MM framework, LHR model finally 
converges to a stationary point. 

Proof: As stated in Lemma IIII.4I the sequences generated 
by MM algorithm converges to a limitation and here we will 
first prove that the convergence is a fixed point. We define the 
mapping from X'' to X'^'^^ as Af(-), and it is straightforward 
to get, lim X* = lim ~ lim M{X*), which imphes 

that lim X* — (f) is a fixed point. In the MM algorithm, when 

>-oo 



constructing the upper-bound, we use the first-order Taylor 
expansion. It is well known that the convex surrogate T{X\T) 
is tangent to H{X) at X by the property of Taylor expansion. 
Accordingly, the gradient vector of T{X\T) and H{X) are 
equal when evaluating at X. Besides, we know that at the 
fixed point, e V^^^r(X|r) and because it is tangent to 
H{X), we can directly get that S V x^^H{X) which proves 
that the convergent fixed point cj) is also a stationary point of 

H{-). m 

In this part, we have shown that with the MM algorithm, 
LHR model could converge to a stationary point. However, it 
is impossible to claim that the converged point is the global 
minimum since the objective function of LHR is not convex. 
Fortunately, with a good starting point, we can always find 
a desirable solution by iterative approaches. In this paper, the 
solution of £1 heuristic model was used as a starting point and 
it could always lead to a satisfactory result. 

E. Solve LHR via reweighted approaches 

To numerically solve the LHR optimization, we remove the 
constants that are irrelative to Y, Z and E in T{X\r) and get 
the new convex objective 



min i[tr (W^Y) 



where Wy(z) = (rY(Z) + «m(n)) and {We^ = 
{Eij + Sy^,Vij. It is worth noting that tr(WyY) = 
ir(Wy YWy). Besides, since both Wy and are positive 
definite, the first constraint in (|6]l is equivalent to 



Wy 











Y 


A 








Z 





Wy 









>- 



Therefore, after convex relaxation, the optimization in ^ 
now subjects to 



s.t. 



i[tr(Wy YWy) + tr{WzZWz)] + X\\We E|| 

WyYWy WyAWz 

(WyAWz)^ WzZWz 



>- 



/(A)+ff(E) 



(14) 



Here, we apply Lemma llII.il to ( fT4l i once again and rewrite 
the optimization in (fl4l i in the form of the summation of the 
nuclear norm and £1 norm. 



min . IIWyAWz 

(A,E) 

S.t. P = /(A) + 



AI|Wb0E|| 



3(E) 



(15) 



In (fTsT l. the operator in the error term denotes the 
component-wise product of two variables, i.e., for W^; and 

E: (We E)y = {WE)tjEij. According to 



we know 

that Y* = UEU"^ and Z* = VI]V^,'if we do singular 
value decomposition for A* = USV^. Accordingly, the 
weight matrix Wy ~ (USU^ + SI„i)^^^^ and matrix 
Wz-(VI]V^+<5I„)-i/2. 

Here, based on MM algorithm, we have converted the 
non-convex LHR optimization to be a sequence of convex 
reweighted problems. We call it reweighted method JTSl l since 
in each iteration we should re-denote the weight matrix set 
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W and use the updated weights to construct the surrogate 
convex function. Besides, the objective in ( fTSl ) is convex with 
a summation of a nuclear norm and a £i norm and can be 
solved by convex optimization. In the next two sections, the 
general LHR model will be adapted to two specific models 
and we will provide the optimization strategies for those two 
models,respectively. 

IV. Low RANK MATRIX RECOVERY FROM CORRUPTIONS 

In this part, we first apply the LHR model to recover a low 
rank matrix from corruption and its performance is compared 
with the widely used Principal Component Pursuit (PCP). 



It is well known (see, for example, ll36l ) that for scalars x and 
y, the unique optimal solution to the problem 



mm a \x\ 



vf 



is given by 

X* = sgn(y)max(|?/| - q;,0) = 



(20) 



(21) 



E* is a solution to the E-minimization if and only if for all 



(22) 



A. Joint optimization for LHR 

Based on the LHR derivations, the corrupted low rank 
matrix recovery problem can be formulated as a reweighted 
problem: 



mm . 

(A,E) 
S.t. P 



WyAWzll 

A + E 



AIIWbOEII 



(16) 



Due to the reweighted weights are placed in the nuclear norm, 
it is impossible to directly get the closed-form solution of the 
nuclear norm minimization. Therefore, inspired by the work 
||5), we introduce another variable J to ( fT6] l by adding another 
equality constraint and to solve. 



min.||J|l^+A||W£;0E|l^. 
s.t. hi = P A E = 
ha = J WyAWz = 



(17) 



Based on the transformation, there is only one single J in 
the nuclear norm of the objective that we can directly get its 
closed-form update rule by ifTTl . There are quite a number of 
methods that can be used to solve it, e.g. with Proximal Gra- 
dient (PG) algorithm |,32j or Alternating Direction Methods 
(ADM) ||33l. In this paper, we will introduce the ADM method 
since it is more effective and efficient. Using the ALM method 
ll34l . it is computationally expedient to relax the equality in 
(flTl l and instead solve: 



L 



< C2,h2 > +f 



A||Wb0E||,^ 

F ^ 



|hi| 



< Ci,hi > 
h2||^) 



(18) 



where <, > is an inner product and Ci and C2 are the 
lagrange multipliers, which can be updated via dual ascend- 
ing method. (fTsl l contains three variables, i.e., J,E and A. 
Accordingly, it is possible to solve problem via a distributed 
optimization strategy called Alternating Direction Method 
(ADM). The convergence of the ADM for convex problems 
has been widely discussed in a number of works |33||35]. 
By ADM, the joint optimization can be minimized by four 
steps as E-minimization, J-minimization, A-minimization and 
dual ascending. We first provide the update rule for E- 
minimization. 



E 



argminAIIWijOEll 

E 



E 



(19) 



Similarly, J-minimization can be solved by 



J = argmin ||J| 
J 



-^||J- WyAWz + /i"^Cif (23) 
2 ^ 



For matrices X,D, previous works, e.g. lfT2l ll37ll . have 
indicted that the unique closed-form optimal solution to the 
problem 



min a IIXI 

X 



^l|X-D||^ 



is given by 

X* = U.s„(E)V^ - rf„(D), 



(24) 



(25) 



where D = UEV^ denotes the singular value decomposition 
of D. From ( |25] l, it is immediate that the unique optimal 
solution to ( |23] l is given by 



J* =rf^-i(WyAWz+M"'C2). 



(26) 



Finally, the solution to A is based on the following optimiza- 
tion problem. 



A* = arg min hi + ^ Ci 

A 



|h2 + ^-iC2|| (27) 



which is only a summation of two F-norms that can be 
addressed by gradient-descending method. Here, we provide 
the update-rule that 

j^k+i ^ A'' +-f[WYih^l+^i-^Cl)Wz + (h^ + /i-iC2)] 

Based on all the previous discussions, we can now give the 
whole framework to solve the LHR model for LRMR via 
reweighted schemes in Algorithmic 



B. Numerical simulations 

We have explained how to recover a low rank matrix via 
LHR in preceding sections. In this section, we will conduct 
some experiments to test its performances with the compar- 
isons to robust PCP from both the simulations and practical 
data. 
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TABLE I 

Evaluations of low-rank matrix recovery of Robust PCA and Log-sum Heuristic Recovery. 







ran/c(A*) = 0.4m ||E*||f^ = O.lm'^ 


rank{A*) = 0.1m E*||^^ = 0.4m'^ 


m = n 


methods 


^"^A^If^ ranfc(A) E time(s) 


"^X^l^'^ ranfc(A) ||E||,„ tjme(s) 


200 


PCP 
LHR 


4.6e-l 103 21066 5.9 
8.1e-4 80 4000 12.7 


1.2e-l 107 23098 7.4 
1.3e-3 20 16031 14.1 


400 


PCP 
LHR 


4.3e-l 207 83954 26.1 
8.2e-4 160 16218 63.4 


7.0e-l 214 89370 35.7 
1.7e-3 40 47999 54.3 


800 


PCP 
LHR 


4.8e-l 414 336188 36.2 
9.9e-4 20 64283 91.7 


9.3e-2 348 355878 48.2 
2.1e-3 320 191998 108.2 



Algorithm 2: Optimization strategy of LHR for corrupted 
matrix recovery 



Corrupted matrix P and parameter A 



t 



1 E^- 



(1) 

Y{Z) 



Input 

Initialization 
1 repeat 

// Dynamically update the weight 
matrix . 



m(n) ■ 



3 
4 
5 
6 
7 
g 
9 
10 

11 
12 

13 
14 
15 
16 
17 
18 
19 



W^^ :=(|E(*-i)| + 5i)-i ; 
UEV^ = S'ra(A(*-i)); 

wi^) := (usu^ + <52i„0-i/2 ; 
w^.*) := (vi]v^ + <52i,0"'/' ; 

Reset Co > 0; > 0; p > 1; fc = 1; A'' = EO = 0; 
while not converged do 

// Variables updating. 



J'' = d„ 



Xfj, 



fe-1 



fc-i 



(h^+M-^C^)]; 
// Dual ascending. 



Ck — 1 

9 - — - 



2 T-pfcii2' 

k := k + 1,/ifc+i = p^J.k^, 

end 

(A(*),E(*)) = (A,,Efe); 
t:=t+l; 
20 until convergence; 

Output : (A(*),E(*)). 



1) General evaluation: We demonstrate the accuracy of 
the proposed LHR algorithm on randomly generated matrices. 
For an equivalent comparison, we adopted the same data 
generating method in [13] that all the algorithms are performed 
on the squared matrices and the ground-truth low rank matrix 
(rank r) with m x n entries, denoted as A* , is generated by 
independent random orthogonal model |13|; the sparse error 
E* is generated via uniformly sampling the matrix and the 
error values are randomly generate in the range [-100,100]. 
The corrupted matrix is generated by P = A* + E*, where 
A* and E* are the ground truth. For simplicity, we denote the 
rank rate as 77 = ''""'^(^ ) ^nd the error rate as f = ^l^n. 

For an equivalent comparison, we use the code in |38| to 



solve the PCP problem Q. HH indicated that PCP method 
could exactly recover a low rank matrix from corruptions 
within the region of 77 + ^ < 0.35. Here, in order to highlight 
the effectiveness of our LHR model, we directly consider much 
difficult tasks that we set 7] + £, ~ 0.5. Each experiment is 
repeated for ten times and the median values are tabulated in 
TableU] In the table, ^ denotes the recovery accuracy, 

rank denotes the rank of the recovered matrix A, ||E||fo 
is the card of the recovered errors and time records the 
computational costs (in seconds). 

From the results, obviously, compared with PCP, LHR 
model could exactly recover the matrix from higher ranks and 
denser errors. However, the table just provides two discrete 
tests. We will provide more thorough investigation in the next 
subsection. 

2) Feasible region: Since the basic optimization involves 
two terms, i.e., low rank matrix and sparse error. In this 
part, we will varies these two variables to test the feasible 
boundary of PCP and LHR, respectively. The experiments 
are conducted on the 400 x 400 matrices with sparse errors 
uniformly distributed in [—100, 100]. In the feasible region 
verification, when the recovery accuracy is larger than 1% (i.e., 
^^"y^'-^lF^ ^ 0.01), it is believed that the algorithm diverges. 
The two rates ry and ^ are varied from zero to one with the 
step of 0.025. On each test point, both the PCP and LHR 
are repeated for 10 times. If the median recovery accuracy is 
less than 1%, the point is regarded as the feasible point. The 
feasible regions of these two algorithms are shown in Fig |l(a)| 

From Fig |l(a)| the feasible region of LHR is much larger 
than the region of PCP. We get the same conclusion as made in 
lfT3l that the feasible boundary of PCP roughly fits the curve 
that r/P^^ + C^^^ = 0.35. The boundary of LHR is around 
the curve that + ^lhr ^ 0.575. Moreover, on the two 

sides of the red curve in Fig |l(a)[ the boundary equation can 
be even extended to r/^^^ + pLHR _ q g pj-Qjj^ j-jjjg j-ggj-^ jj- jg 

apparent that the proposed LHR algorithm covers a larger area 
of the feasible region, which implies that LHR could handle 
more difficult tasks that robust PCA fails to do. 

3) Convergence discussions: Finally, we will experimen- 
tally verify the convergence of the LHR. The experiments are 
conducted on 400 x 400 matrices with the rank equivalent 
to 40 and the portion of gross errors are set as 15%, 30% 

' In |34|, Lin et al. provided two .solvere, i.e. exact and inexact solvers, 
to solve the PCP problem. In this paper, we use the exact solver for PCP 
because it performs better than inexact solver 

-We do not use the average values here since in cases of divergence some 
extreme lai'ge outliers may greatly affect the average values of the accuracy. 



(a) Feasible region verification. 
Fig. 1. Feasible region and the convergence verifications. 



(b) Convergence verification. 



and 45%, respectively. The experimental results are reported 
in Fig |IV-B2] where the axis's coordinate denotes the iteration 
sequences, i.e. the count t in Algorithm[T] 

The top sub-figure in Fig lIV-B2] reports the time cost of each 
iteration. It is interesting to note that the denser the error is, 
the more time cost is required for one iteration. Besides, the 
most time consuming part occurs in the first iteration. During 
the first iteration, ( fTSI ) subjects to the typical PCP problem. 
However, in the second and the third iteration, the weight 
matrix is assigned with different values and thus it could make 
( fTST i converge with less iterations. Therefore, the time cost for 
each iteration is different in LHR. The first iteration needs 
many computational resources while the later ones can be 
further accelerated owing to the penalty of the weight matrix. 

The middle sub-figure records the stopping criterion, which 
is denoted as n„r/7uf — —■ It is believed that the LHR 
converges when the stopping criterion is less than le — 5. It 
is apparent from Fig lIV-B2] that the LHR could converge in 
just three iterations with 15% and 30% gross errors. While for 
the complicated case with 45% errors, LHR can converge in 
four steps. The bottom sub-figure shows the recovery accuracy 
after each iteration. It is obvious that the recovery accuracy 
increases significantly from the first iteration to the second 
one. Such an increase phenomenon verify the advantage of 
the reweighted approach derived from LHR. 

C. Practical applications 

PCP is a powerful tool for many practical applications. 
Here, we will conduct two practical applications to verify the 
effectiveness of PCP and LHR on real-world data. 

1) Shadow and specularities removal from faces: Following 
the framework suggested in llT3l . we stack the faces of the 
same subject under different lighting conditions as the columns 
in a matrix P. The experiments are conducted on extended 
Yale-B dataset where each face is with the resolutions of 
192 X 168. Then, the corrupted matrix P is recovered by 
PCP and LHR, respectively. After recovery, the shadows, 
specularities and other reflectance are removed in the error 
matrix (E) and the clean faces are accumulated in the low 
rank matrix (A). The experimental results are provided in 
Fig|2] where in each sub-figure from left to right are: original 




(b) Shadow texture 



Fig. 2. Shadow and specularities removal from faces (best viewed on screen). 



faces in Yale-B (left), faces recovered by PCP (median) and 
faces recovered by LHR (right), respectively. It is greatly 
recommended to enlarge the faces in Fig|2]to view the details. 
In Fig |2(a)[ when there exist dense shadows on the face 
image, the effectiveness of LHR becomes apparent to remove 
the dense shadows distribute on the left face. The dense 
texture removal ability is especially highlighted in Fig |2(b)[ 
where there are significant visual contrasts between the faces 
recovered by PCP and LHR. The face recovered by LHR is 
much clean. 

2} Video surveillance: The background modeling can also 
be categorized as a low rank matrix recovery problem, where 
the backgrounds correspond to the low rank matrix A and the 
foregrounds are removed in the error matrix E. We use the 
videos and ground truth in ||39l for quantitative evaluations. 
Three videos used in this experiment are listed in Figl3] 

For the sake of computational efficiency, we normalize each 
image to the resolutions of 120 x 160 and all the frames 
are converted to gray-sealer The benchmark videos used here 
contain too many frames which lead to a large matrix. It is 
theoretical feasible to use the two methods for any large matrix 
recovery. Unfortunately, for practical implementation, large 
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Video frame 


Ground truth l.MK 

(a) HW(439 frames). 


PCP 






II 


IS 


n 


Vidiii li iim 




Gruuiidlrulh l-HK 

(b) Lab(886 frames) 






55 


1 







(c) Seam(459 frames) 

Fig. 3. Benchmark videos for background modeling. In each sub-figure, 
from left to right are original video frames, foreground ground truth, LHR 
result and PCP result,respectively. 



matrices are always beyond the memory limitation of Matlab. 
Therefore, for each video, we uniformly divide the large matrix 
to be sub-matrices which has less than 200 columns. We 
recover these sub-matrices by setting A = respectively. 

The segmented foregrounds and the ground truth are shown 
in Fig 15] From the results we know that LHR could remove 
much denser errors from the corrupted matrix rather than PCP. 
Such claim is verified from three sequences in Figl3]that LHR 
makes much complete object recovery from the video. Besides, 
in Fig |3(c)[ it is also apparent that LHR only keeps dense 
errors in the sparse error term. In the seam sequences, there 
are obvious illumination changes in different frames. PCP is 
sensitive to these small variations and thus makes much more 
small isolated noise parts in the foreground. On the other hand, 
LHR is much robust to these local variations and only keeps 
dense corruptions in the sparse term. 

Although there are many advanced techniques for video 
background modeling, it is not the main concern of this 
work. Therefore, without the loss of generality, we use the 
Mixture of Gaussian (MoG) ll40l as the comparison baseline. 
For evaluation, both the false negative rate (FNR) and false 
positive rate (FPR) are calculated in the sense of foreground 
detection. These two scores exactly correspond to the Type 
I and Type II errors in machine learning, whose definitions 
may refer to BTI . FNR indicates the ability of the method 
to correctly recover the foreground and FPR represents the 
potential of a method on distinguishing the background. Both 
these two rates are judged by the criterion that the less the 
better. The experimental results are tabulated in table lIV-C2] 
We also report the time cost (in minutes) of PCP and LHR on 
these videos. But we omit the time cost of MoG since it can 
be finished in almost real time. 

From the results, PCP and LHR greatly outperform the per- 
formance of MoG. Moreover, LHR has lower FNRs than PCP 
which implies that LHR could better detect the foreground 
than PCP. However, on the video highway and seam, the FPR 
score of LHR is a little worse than PCP. One possible reason 
may ascribe to that there are too many moving shadows in 



TABLE II 

Quantitative evaluation of PCP and LHR for video 

SURVEILLANCE. 



Data 


False Negative Rate% 
MoG PCP LHR 


False Positive Rate% 
MoG PCP LHR 


Time(r?i) 
PCP LHR 


HW 


22.2 18.7 14.3 


8.8 7.8 8.4 


13.2 23.5 


Lab. 


15.1 10.1 8.3 


6.7 6.4 6.1 


25.4 43.7 


Seam 


23.5 11.3 9.2 


9.7 6.1 6.3 


11.4 19.9 



these two videos, where both the objects and shadows are 
regarded as errors. In the ground truth frames, the shadows 
are regarded as background. LHR could recover much denser 
errors form a low rank matrix and thus causes a comparable 
low FNR score. 

V. LHR FOR LOW RANK REPRESENTATION 

In this part, LHR will be applied to the task of low rank 
representation (LRR)|5||16| by formulating the constraint as 
P — PA+E, where the correlation affine matrix A is low rank 
and the noises in E are sparse. In the remaining parts of this 
section, we will first show how to use the joint optimization 
strategy to solve the LRR problem by LHR model. Then, 
two practical applications on motion segmentation and stock 
clustering will be presented and discussed. 

A. Joint optimization strategy of LHR for LRR 

When applying LHR to low rank representation, we should 
solve a sequence of convex optimizations in the form, 

A||Wb0E||. 



min. WWyAWzt 
s.t. P = PA + E 



(28) 



To make the nuclear norm trackable, we add an equality and 
tries to solve 



min.||J|L+A||Ws0E||,^ 
s.t. bi=P-PA-E = 

b2 = J - WyAWz = 



(29) 



Using the ADM strategy and following the similar derivations 
introduced in subsection IIV-AI we can solve the optimization 
in ( |29b and we directly provide the update rules for each 
variable in algorithm [3] To show LHR ideally represents low 



Algorithm 3: Update rule for the variables in 



2 J'^ = d^-i(W^^A' 

3 A"^ = 

A'=-i+7[W^^(b5;-+/i- 

4 // Dual ascending 



w 



-p^(b^ 



5 cf = ci 



k-1 



6 



Atfeb^; 



rank structures from data, experiments on subspace clustering 
are conducted on two datasets. First, we test LHR on slightly 
corrupted data-set, i.e., Hopkinsl56 motion database. Since the 
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effectiveness of LHR are especially emphasized on the data 
with great corruptions. We will also consider one practical 
application of using LHR for stock clustering. 

B. Motion segmentation in video sequences 

In this part, we apply LHR to the task of motion segmen- 
tation in Hopkins 155 dataset 122). Hopkins 155 database is a 
benchmark platform to evaluate general subspace clustering 
algorithms, which contains 156 video sequences and each of 
them has been summarized to be a matrix recoding 39 to 50 
data vectors. The primary task of subspace clustering is to 
categorize each motion to its corresponding subspace, where 
each video corresponds to a sole clustering task and it leads 
to 156 clustering tasks in total. 

For comparisons, we will compare LHR with LRR as 
well as other benchmark algorithms for subspace cluster- 
ing. The comparisons include Random Sample Consensus 
(RANSAOBSl. Generalized Principal Component Analysis 
(GPCA)|l451, Local Subspace Affinity (LSA), Locally Linear 
Manifold Clustering (LLMC) and Sparse Subspace Clustering 
(SSC). RANSAC is a statistic method which clusters data 
by iteratively distinguishing the data by inliers and outliers. 
GPCA presents an algebraic method to cluster the mixed data 
by the normal vectors of the data points. Manifold based 
algorithms, e.g. LSA and LLMC, assume that one point 
and its neighbors span as a linear subspace and they are 
clustered via spectral embedding. SSC assumes that the affine 
matrix between data are sparse and it segments the data via 
normalized cut[44|. 

In LRR 13, Liu et al. introduced two models that re- 
spectively used ti norm and ^2,1 norm to penalize sparse 
corruptions. In this paper, we will only report the results 
with the comparison to ^2,1 norm since it always performs 
better than li penalty in LRR. In order to provide a thorough 
comparison with LRR, we strictly follow the steps and the 
default parameter settings suggested in 15]. For LHR model, 
we choose parameter A — 0.4. In the experiments of LRR for 
motion segmentation, some post-processing are performed on 
the learned low rank structure to seek for the best clustering 
accuracy. For example, in LRR, after getting the representation 
matrix A, an extra PCP processing are implemented on A to 
enhance the low-rankness and such post-processing definitely 
increases SC accuracy. However, the main contribution of this 
work only focus LHR model on low-rank structure learning 
while not on the single task of subspace clustering. Therefore, 
we exclude all the post-processing steps to emphasize the 
effectiveness of the LRSL model itself. In our result, all the 
methods are implemented with the same criterion to avoid bias 
treatments. 

Hopkins 155 contains two subspace conditions in a video 
sequence, i.e., with two motions or three motions and thus 
we report the segmenting errors for two subspaces (TWO), 
three subspaces (THREE) and for both conditions (ALL) 
in Table lV-Bl From the results we know that sparse based 
methods generally outperform other algorithms for motion 
segmentation. Among three sparse methods, LHR gains the 
best clustering accuracy. However, the accuracy only has slight 



improvements on LRR. As indicated in Q, motion data only 
contains small corruptions and LRR could already achieve 
promising performance with the accuracy higher than 90%. 
With some post-processing implementations, the accuracy can 
even be further improved. Therefore, in order to highlight 
the effectiveness of LHR on low rank representation with 
corrupted data, some more complicated problems will be 
considered. 

TABLE III 

Motion segmentation errors (mean ) of several algorithms on 
THE Hopkins 155 motion segmentation database. 



Category 


Method 


TWO 


THREE 


ALL 


Algebraic 


GPCA 


11.2 


27.7 


14.2 


Statistic 


RANSAC 


8.9 


24.1 


12.5 


Manifold 


LSA 


8.7 


21.4 


11.6 




LLMC 


8.1 


20.8 


10.9 




SSC 


5.4 


15.3 


7.6 


Sparse 


LRR 


4.4 


14.9 


6.7 




LHR 


3.1 


13.9 


5.6 



C. Stock clustering 

It is not trivial to consider applying LHR model to more 
complicated practical data where the effectiveness of LHR on 
corrupted data will be over emphasized. In practical world, 
one of the most difficult data structures to be analyzed is 
the stock price which can be greatly affected by company 
news, rumors and global economic atmosphere. Therefore, 
data mining approaches of financial signals have been proven 
to be very difficult but on the other hand, it is very profitable. 

In this paper, we will discuss how to use the LRR and 
LHR model to the interesting, albeit not very lucrative, task of 
stock clustering based on their industrial categories. In many 
stock exchange centers around the world, stocks are always 
divided into different industrial categories. For example, on 
the New York Stock Exchange Center, IBM and J.P.Morgan 
are respectively categorized into the computer based system 
category and money center banks category. It is generally 
assumed that stocks in the same category always have similar 
market performance. This basic assumption is widely used by 
many hedge funds for statistic arbitrage. In this paper, we 
consider that stocks in the same industrial category span as 
a subspace and therefore the goal of stock clustering, a.k.a. 
stock categorization, is to identify a stock's industrial label by 
its historical prices. 

The experiments are conducted on stocks from two global 
stock exchange markets in New York and Hong Kong. In 
each market, we choose 10 industrial categories which have 
the largest market capitalizations. The categories divided by 
the exchange centers are used as the ground truth label. 
In each category, we only choose the stocks whose market 
capitalizations are within the top 10 ranks in one category. 
The stock prices on New York market are obtained from 
Ii45j and the stock prices in Hong Kong market are obtained 
from 1461 . Unfortunately, some historical prices for stocks in 
P31 are not recorded and providecfl Therefore, for the US 

^For example, in the industrial category of Drug Manufactures, it is not 
possible to get the historical data of CIPILA.LTD from t45i which is the 
only the interface for us to get the stock prices in US. 
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TABLE IV 

Clustering errors of the stocks in ten categories from New 
York and Hong Kong markets. 



Markets 


GPCA 


RANSAC 


LSA 


LLMC 


New York 


60.1 


59.3 


51.7 


54.3 


Hong Kong 


57.3 


55.8 


54.7 


53.7 


Markets 


ssc 


LRR 


LHR 




New York 


48.6 


44.1 


36.2 




Hong Kong 


49.1 


46.5 


38.3 





market, we accumulated 76 stocks divided into 10 classes and 
each class contains 7 to 9 stocks; for Hong Kong market, 
we obtain 96 stocks spanning 10 classes. For classification, 
the weekly closed prices from 07/01/2008 to 31/10/2011 
including 200 weeks, are used because financial experts always 
look at weekly close prices to judge the long-term trend of a 
price. 

As stated previously, the stock prices may have sharp drop 
and up which cause outliers in the raw data. Besides, the prices 
of different stocks are various that cannot be evaluated with the 
same quantity scaler For the ease of pattern mining, we use 
the time-based normalization strategy suggested in 1.47 J 1,48,1 to 
pre-process the stock prices: 



P{t) - Hajt) 
(7a{t) 



where p{t) is the price of a certain stock at time t , /iQ(t) 
and aa{t) are respectively the average value and standard 
derivation of the stock prices in the interval [t — a,t]. We 
plot the normalized stock prices of three categories in Fig|4] 
After normalization, we further adopt PCA method to reduce 
the dimensions of stocks from M^"*' to M^. Theoretically, the 
rank of subspaces after PCA should be 10 — 1 = 9 because it 
contains 10 subspaces and the rank is degraded by 1 during 
PCA implementation. But, in the simulation, we find that the 
maximal clustering accuracies for both markets are achieved 
with the PCA dimensions of 5. 



Areospace&Derense 




20 40 



80 200 



80 200 



Fig. 4. Normalized stock prices in NY of the categories: Are- 
ospace&Defense,Banks and Wireless communication. In each category, hnes 
in different colors represent different stocks, (best viewed on screen) 



The clustering errors of different SC methods on the stocks 
from these two markets are summarized in TablejIVI From 



the results, it is obvious that LHR significantly outperforms 
other methods. It improves statistic and graph based methods 
for about 20%. Among all the sparse methods, LHR makes 
improvements on LRR for about 8%. Although LHR performs 
the best among all the methods, the clustering accuracy is 
only about 63% and 61% on US and Hong Kong markets, 
respectively. The clustering accuracy is not as high as those on 
the motion data. This may be ascribe to that the raw data and 
ground truth label themselves contain many uncertainties. See 
the bottom sub-figure in Fig|4] for the stocks in the wireless 
communication category, the normalized stock marked with 
the green color performs quite different from other stocks 
in the same category. But the experimental results reported 
here is sufficient to verify the effectiveness of subspace clus- 
tering for 10 classes categorization. If no intelligent learning 
approaches were imposed, the expected accuracy may be only 
10%. Although with such "bad" raw data, the proposed LHR 
could achieve the accuracy as high as 62% in a definitely 
unsupervised way. 



VI. CONCLUSION 

This paper presents a log-sum heuristic recovery algorithm 
to learn the essential low rank structures from corrupted 
matrices. We introduced a MM algorithm to convert the non- 
convex objective function a series of convex optimizations 
via reweighed approaches and proved that the solution may 
converge to a stationary point. Then, the general model was 
applied to two practical tasks of LRMR and SC. In both of the 
two models, we gave the solution/update rules to each variable 
in the joint optimizations via ADM. For the general PCP 
problem, LHR extended the feasible region to the boundary 
of ?7 + ^ = 0.58. For SC problem, LHR achieved state-of-the- 
art results on motion segmentation and achieved promising 
results on stock clustering which contain too many outliers 
and uncertainties. However, a limitation of the proposed LHR 
model is for the reweighted phenomenon that requires to solve 
convex optimizations for multiple times. The implementations 
of LHR is a bit more time consuming than PCP and LRR. 
Therefore, LHR model is especially recommend to learn the 
low rank structure from data with denser corruptions and 
higher ranks. 



Appendix 



A. ABBREVIATIONS 



ADM: Alternating Direction Method, MM: Majoriza- 
tion Minimization, GPCA:Generalized Principal Component 
Analysis, PCP: Principal Component Pursuit, LHR:Log-sum 
Heuristic Recovery, LLMC: Locally Linear Manifold Cluster- 
ing, LRMR: Low Rank Matrix Recovery, LRR:Low rank rep- 
resentation, LRSL: Low Rank Structure Learning, LSA: Lo- 
cal Subspace Affinity,MoG: Mixture of Gaussian, RANSAC: 
Random Sample Consensus, RPCA:Robust Principal Compo- 
nent analysis, SC:Subspace Clustering, SSC: Sparse Subspace 
Clustering. 
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B. Convergence of subsequences in the proof of lemma \III.4\ 

In this part, we provide the discussions about the properties 
of the convergent subsequences that are used in the proof of 
Lemma IIII.4I 

Since sequence X* = {Y*, Z*, A*, E*} is generated via 
Eq|6] we know that X E D strictly holds. Therefore, all the 
variables (i.e. Y*,Z*,A*,E*) in set X should be bounded. 
This claim can be easily verified because that if any variable 
in the set X goes to infinity, the constraints in domain D will 
not be satisfied. Accordingly, we know that sequence X* is 
bounded. According to the Bolzano-Welestrass Theorem |49|, 
we know that every bounded sequence has a convergent sub- 
sequence. Since X*- is bounded, it is apparent that there exists 
a convergent subsequence A**. Without the loss of generality, 
we can construct another subsequence A''=+^ which is also 
convergent. The proof of the convergence of A*''+^ relies on 
the monotonically decreasing property proved in Lemma UlI. 31 
Since H{-) is monotonically decreasing, it is easy to check 
thati?(A*'») > iJ(A*'=+i) > iJ(A*'=+i) > i7(A*'=+i+i) > 
_ff(A*'=+i+i). According to the above inequalities, we get that, 

lim i7(A*'=) > lim H{X*''+^) > lim i7(A*'=+^) (30) 

k~>oo k^oo k-^oc 

Since subsequence A**" converges, it is obvious that 
lim iJ(A*'=) = lim H{X^''+^) = (3. According to 

the famous Squeeze Theorem 11501 . from ( [30l l, we get the 
lim H{X^^+^) ^ H{ lim A*'=+i) = /3. Since the function 

H{-) is monotonically decreasing and X is bounded, the 
convergence of iJ(A*''+^) can be obtained if and only if the 
subsequence A*''+^ is convergent. 
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